2023-12-21 08:37:02 | AIbase
Zhipu AI Open-Sources CogAgent, a Visual Language Model That Supports GUI Q&A
Zhipu AI has open-sourced CogAgent, a visual language model with 18 billion parameters. CogAgent excels at GUI understanding and navigation and achieves state-of-the-art general performance on multiple benchmarks. The model accepts high-resolution visual input, supports conversational Q&A, and can answer questions about any GUI screenshot. It also handles OCR-related tasks, a capability that was significantly strengthened through pre-training and fine-tuning.